TITLE OF THE INVENTION
IMAGE RECOGNITION METHOD AND APPARATUS 

BACKGROUND OF THE INVENTION 
This application is based on Japanese Patent
Application No. 10-371332, filed December 25, 1998, the
contents of which are incorporated herein by reference.

The present invention relates to an image
recognition apparatus and method for recognizing the
shape and/or movement of an image on the basis of
a captured range image or range image stream.

Conventionally, upon recognizing three-dimensional
motions such as motions of the hand, face, and the like
of a person, the object to be recognized such as the
hand, face, or the like is sensed from its front side
using an image sensing apparatus such as a video camera
or the like. Then, recognition is made by estimating
three-dimensional motion using limited changes in
two-dimensional (without any depth information) motion
that appears in the sensed image, and various other
kinds of knowledge.

Some recognition methods will be explained below.
The first method estimates motion using feature
points of the object to be recognized. In this method,
some feature points are set in advance on the object to
be recognized, and motion is estimated using a change
in positional relationship between the feature points.
For example, upon recognizing a horizontal shake
(horizontal rotation) of the face, several feature
points of the face are set at the eyes, nose, and the
like, and a clockwise shake of the face is estimated
from changes, e.g., the feature points at the positions
of the eyes have moved horizontally, the spacing
between the feature points at the two eyes has
decreased, the feature point at the right eye has
disappeared (since the right eye has moved to a
position that cannot be seen from the camera), and so
forth upon movement of the face.

However, when this method is used, markers and the
like must be pasted at the positions of the feature
points of the face to stably obtain the corresponding
points in a camera image, and the environment that can
use this method is limited. In some cases, no markers
are used. However, in such a case, feature points cannot
be stably extracted, and much computation cost is
required to obtain feature points.

Another method estimates motion by obtaining
changes in motion moment. This method exploits the
fact that when a hand is rotated about a vertical
axis, the forward projection area of the hand in the
horizontal direction changes dramatically, but it does
not change much in the vertical direction. In such a
case, rotation of the hand about the vertical axis is
estimated solely because the motion moment of the hand
in the horizontal direction changes considerably.
This method can estimate three-dimensional motion.
However, since the shape of the object that can be
used in recognition is limited, and different
two-dimensional motions can hardly be distinguished from
each other, recognition errors readily occur.

Also, a method of estimating motion from the
geometric shape of the object to be recognized is known.
For example, when three-dimensional motion of a dice
is to be recognized, it is estimated that the dice has
been cast when the one pip is seen via the camera at
a given timing, and then it changes to the three pips.
Since this method exploits knowledge about geometric
stereoscopic information of the object to be recognized,
three-dimensional motion can be relatively reliably
estimated. However, objects that can be recognized are
limited. In addition, geometric knowledge about that
object is required, resulting in poor versatility.

Also, various other methods are available.
However, in these methods, since three-dimensional
motion is estimated from an image that has only
two-dimensional information, it is difficult to stably
recognize three-dimensional motion with high precision.
At the time of capturing an image of a
three-dimensional object by a camera as two-dimensional
information, a large number of pieces of important
information are lost.

To avoid these problems, an object is
simultaneously sensed by a plurality of video cameras
at several positions, corresponding points among
the cameras are obtained to compute stereoscopic
information from a plurality of sensed images, and
three-dimensional motion is obtained using the computed
information.

In this method, since the stereoscopic information
is defined based on a plurality of sensed images in
practice, problems posed when three-dimensional
information is estimated from two-dimensional
information can be solved. However, since computations
of the corresponding points used to stereoscopically
combine images from the plurality of cameras require much
time, this method is not suitable for a real-time process.
In order to obtain corresponding points, since camera
position information is required, the camera positions
are limited and they must be calibrated.

As described above, the conventional methods for
recognizing three-dimensional motion from an image
suffer various problems.

In the conventional method, since the object to
be recognized is captured using, e.g., a video camera,
as an image having only two-dimensional information,
three-dimensional motion must be recognized based on
only the two-dimensional information, and it is hard to
stably recognize three-dimensional motion with high
precision.

Also, the object to be recognized must be prepared
in advance as a template or a recognition dictionary,
resulting in cumbersome operations. Also, the
templates and recognition dictionary must be modified
in correspondence with the object to be recognized,
resulting in high cost.

Furthermore, matching with a huge number of
templates is required upon recognition, and a long
recognition time is required.

BRIEF SUMMARY OF THE INVENTION 
It is an object of the present invention to 
provide an image recognition method which can stably 
and quickly recognize three-dimensional motion with 
high precision without requiring any templates or 
dictionary for recognition, since a three-dimensional 
deformed image of a range image corresponding to an 
object is used, and an image recognition apparatus 
using that method. 

In order to achieve the above object, according to 
the first aspect of the present invention, an image 
recognition method is characterized by recognizing the 
presence/absence of three-dimensional motion of an 
object in a range image by comparing a deformed image 
obtained by deforming a captured range image with 
a newly captured range image. 

According to the second aspect of the present 
invention, an image recognition method is characterized 
by recognizing the presence/absence of three-dimensional
motion of an object in a range image by comparing a
deformed image obtained by deforming a captured range
image with a newly captured range image, and recognizing
a series of motions recognized from each of a series of
a plurality of range images.

According to the third aspect of the present 
invention, an image recognition apparatus comprises 
image capture means for capturing a range image, image 
deformation means for deforming the range image 
captured by the image capture means, and recognition 
means for recognizing the presence/absence of three- 
dimensional motion of an object by comparing a deformed 
image obtained by the image deformation means and a new 
range image captured by the image capture means. 

According to the fourth aspect of the present 
invention, an image recognition apparatus comprises 
image capture means for capturing a range image, 
image deformation means for deforming the range image 
captured by the image capture means, first recognition 
means for recognizing the presence/absence of three- 
dimensional motion of an object by comparing a deformed 
image obtained by the image deformation means and a new 
range image captured by the image capture means, and 
second recognition means for recognizing a series of 
motions recognized from each of a series of a plurality 
of range images by the first recognition means. 
According to the fifth aspect of the present
invention, an image recognition apparatus comprises 
image capture means for capturing a range image, 
storage means for storing the range image captured by 
the image capture means, image deformation means for 
deforming a designated range image of the range image 
captured by the image capture means and/or the range 
image stored in the storage means, and recognition 
means for recognizing the presence/absence of three- 
dimensional motion of an object by comparing one of a 
deformed image obtained by the image deformation means 
and the range image stored in the storage means, and 
a new range image captured by the image capture means. 

According to the sixth aspect of the present 
invention, an image recognition apparatus comprises 
image capture means for capturing a range image, 
storage means for storing the range image captured by 
the image capture means, image deformation means for 
deforming a designated range image of the range image 
captured by the image capture means and/or the range 
image stored in the storage means, first recognition 
means for recognizing the presence/absence of three- 
dimensional motion of an object by comparing one of a 
deformed image obtained by the image deformation means 
and the range image stored in the storage means, and 
a new range image captured by the image capture means, 
and second recognition means for recognizing a series
of motions recognized from each of a series of a
plurality of range images by the first recognition
means.

According to the present invention, since a 
three-dimensional deformed image of a range image 
corresponding to an object is used, the presence/ 
absence of three-dimensional motion can be stably 
and quickly recognized with high precision without 
requiring any templates or dictionary for recognition. 

Additional objects and advantages of the invention 
will be set forth in the description which follows, and 
in part will be obvious from the description, or may 
be learned by practice of the invention. The objects 
and advantages of the invention may be realized and 
obtained by means of the instrumentalities and combina- 
tions particularly pointed out hereinafter. 

BRIEF DESCRIPTION OF THE SEVERAL VIEWS OF THE DRAWING 
The accompanying drawings, which are incorporated 
in and constitute a part of the specification, illust- 
rate presently preferred embodiments of the invention, 
and together with the general description given above 
and the detailed description of the preferred embodi- 
ments given below, serve to explain the principles of 
the invention. 

FIG. 1 is a schematic block diagram showing
an example of the arrangement of an image recognition
apparatus according to the first embodiment of the
present invention;

FIG. 2 is a schematic diagram showing the 
arrangement of an image capture section; 

FIG. 3 shows a matrix of a range image; 

FIG. 4 three-dimensionally shows the range image;

FIG. 5 is a plan view showing an example of the 
outer appearance of light-emitting and light-receiving 
sections that construct the image capture section; 

FIG. 6 shows an example of a range image; 

FIG. 7 is a flow chart showing the flow of 
a rotation deformation process of a range image; 

FIGS. 8A and 8B are views for explaining 
segmentation in units of voxels; 

FIGS. 9A through 9D are views for explaining 
the segmentation method in units of voxels; 

FIGS. 10A and 10B are views for explaining a 
rotation deformation process in units of voxels, and 
showing the voxel positions before and after rotation 
deformation; 

FIGS. 11A and 11B are views for explaining the 
way a range image is reconstructed after rotation 
computation; 

FIGS. 12A and 12B show a sample image of a hand 
and its deformed image; 

FIGS. 13A and 13B show a deformed image of the 
hand and its latest image; 

FIG. 14 is a flow chart showing the flow of
a template matching process;

FIG. 15 shows a sample image of a face;

FIG. 16 shows a deformed image which is generated
from the sample image shown in FIG. 15, and the face of
which is turned slightly upward by rotating the sample
image by (θx, θy, θz) = (2, 0, 0) about the barycentric
position of the head as the center;

FIG. 17 shows a deformed image which is generated 
from the sample image shown in FIG. 15, and the face 
of which is turned slightly downward by rotating the 
sample image by (θx, θy, θz) = (-2, 0, 0) about the
barycentric position of the head as the center; 

FIG. 18 shows a deformed image which is generated 
from the sample image shown in FIG. 15, and the face 
of which is turned slightly rightward on the plane of 
paper by rotating the sample image by (θx, θy, θz) =
(0, 2, 0) about the barycentric position of the head as 
the center; 

FIG. 19 shows a deformed image which is generated 
from the sample image shown in FIG. 15, and the face 
of which is turned slightly leftward on the plane of 
paper by rotating the sample image by (θx, θy, θz) =
(0, -2, 0) about the barycentric position of the head 
as the center; 

FIG. 20 shows the latest image of the face to be
compared with the deformed images shown in FIGS. 16
through 19;



FIG. 21 is a flow chart showing the flow of 
a template matching process; 

FIG. 22 shows a sample image of a hand; 

FIG. 23 shows a deformed image which is generated
from the sample image shown in FIG. 22, and is turned
slightly upward by rotating the sample image about the
barycentric position of the hand;

FIG. 24 shows a deformed image which is generated
from the sample image shown in FIG. 22, and is turned
slightly downward by rotating the sample image about
the barycentric position of the hand;

FIG. 25 shows a deformed image which is generated
from the sample image shown in FIG. 22, and is turned
slightly rightward on the plane of paper by rotating
the sample image about the barycentric position of the
hand;

FIG. 26 shows a deformed image which is generated
from the sample image shown in FIG. 22, and is turned
slightly leftward on the plane of paper by rotating
the sample image about the barycentric position of the
hand;

FIGS. 27A and 27B show two deformed images which
are generated from the sample image shown in FIG. 22,
and which are turned slightly upward by rotating the
sample image through different angles about the
barycentric position of the hand;

FIGS. 28A and 28B show two deformed images which
are generated from the sample image shown in FIG. 22,
and which are turned slightly downward by rotating the
sample image about the barycentric position of the
hand;

FIGS. 29A and 29B show two deformed images which
are generated from the sample image shown in FIG. 22,
and which are turned slightly rightward on the plane of
paper by rotating the sample image through different
angles about the barycentric position of the hand;

FIGS. 30A and 30B show two deformed images which
are generated from the sample image shown in FIG. 22,
and which are turned slightly leftward on the plane of
paper by rotating the sample image through different
angles about the barycentric position of the hand;

FIG. 31 is a schematic block diagram showing
an example of the arrangement of an image recognition
apparatus according to the second embodiment of the
present invention;

FIG. 32 is a schematic block diagram showing
an example of the arrangement of an image recognition
apparatus according to the third embodiment of the
present invention;

FIG. 33 is a schematic block diagram showing
an example of the arrangement of an image recognition
apparatus according to the fourth embodiment of the
present invention;

FIG. 34 is a schematic block diagram showing
an example of the arrangement of an image recognition
apparatus according to the first modification of the
fourth embodiment of the present invention;

FIG. 35 is a schematic block diagram showing
an example of the arrangement of an image recognition
apparatus according to the second modification of the
fourth embodiment of the present invention;

FIG. 36 is a schematic block diagram showing
an example of the arrangement of an image recognition
apparatus according to the fifth embodiment of the
present invention; and

FIG. 37 is a schematic block diagram showing
an example of the arrangement of an image recognition
apparatus according to the sixth embodiment of the
present invention.

DETAILED DESCRIPTION OF THE INVENTION
Preferred embodiments of the present invention
will be described hereinafter with reference to the
accompanying drawings.
(First Embodiment)

The first embodiment of the present invention will
be explained first.

FIG. 1 is a block diagram showing the overall
arrangement of an image recognition apparatus according
to the first embodiment of the present invention.

The image recognition apparatus of this embodiment
is constructed by an image capture section 1 comprising
image sensing means for capturing a range image stream,
an image deformation section 2 for performing a
three-dimensional rotation deformation process of an
arbitrary range image captured by the image capture
section 1, and an image comparison section 3 for
comparing the deformed range image obtained by the
image deformation section 2 with an arbitrary range
image in the range image stream captured by the image
capture section 1.

The image capture section 1 and range image will
be explained below.

The image capture section 1 captures an object to
be recognized (e.g., the hand, face, whole body, or the
like of a person) as images having depth values that
reflect the three-dimensional shape of the object
(to be referred to as range images hereinafter) at
predetermined time intervals (e.g., every 1/30 sec)
(the unit 1 can be implemented using, e.g., an image
capture method of Japanese Patent Publication
No. 8-274949).

Since range images are captured at predetermined
time intervals, they are sequentially held in an
internal or external memory or the like of the image
capture section 1, thus obtaining a moving picture of
the object based on the range images (to be referred to
as a range image stream hereinafter). At this time,
the range image stream is obtained as a set of a
plurality of frames of range images such as "latest
range image", "range image t sec before (to be referred
to as "one frame before" hereinafter) the latest range
image", "range image 2t sec before (to be referred to
as "two frames before" hereinafter) the latest range
image", and the like.

The image capture section 1 is mainly comprised of
a light-emitting unit 101, light-receiving unit 103,
reflected light extraction unit 102, and timing signal
generation unit 104, as shown in FIG. 2.

The light-emitting unit 101 emits light whose
intensity varies along with time in accordance with
timing signals generated by the timing signal
generation unit 104. This light strikes an object in
front of the light-emitting unit.

The light-receiving unit 103 detects the amount
of light emitted by the light-emitting unit 101 and
reflected by the object.

The reflected light extraction unit 102 extracts
the spatial intensity distribution of the reflected
light received by the light-receiving unit 103. Since
the spatial intensity distribution of the reflected
light can be considered as an image, it will be
referred to as a reflected light image or range image
hereinafter.

The light-receiving unit 103 receives not only
the light emitted by the light-emitting unit 101 and
reflected by the object, but also external light such
as illumination light, sunlight, and the like at the
same time. Hence, the reflected light extraction unit
102 extracts only light components emitted by the
light-emitting unit 101 and reflected by the object by
computing the difference between the amount of light
received when the light-emitting unit 101 emits light,
and that received when the light-emitting unit 101 does
not emit light.
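
The difference computation performed by the reflected
light extraction unit 102 can be illustrated by the
following minimal sketch. The function name, the list-of-rows
frame layout, and the clamping of negative differences to
zero are assumptions made for illustration only and are not
taken from the specification.

```python
def extract_reflected_light(lit_frame, unlit_frame):
    """Subtract the emitter-off frame from the emitter-on frame, pixel by pixel.

    Both frames are lists of rows of received-light amounts.
    Negative differences are clamped to zero (an assumption, to absorb noise).
    """
    return [
        [max(lit - unlit, 0) for lit, unlit in zip(lit_row, unlit_row)]
        for lit_row, unlit_row in zip(lit_frame, unlit_frame)
    ]


lit = [[120, 200], [90, 60]]      # emitter on: reflected light plus external light
unlit = [[100, 100], [90, 70]]    # emitter off: external light only
reflected = extract_reflected_light(lit, unlit)
```

Only the component that changes with the emitter state
survives the subtraction, which is why steady external light
such as sunlight is suppressed.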

The reflected light extraction unit 102 extracts
the intensity distribution, i.e., a reflected light
image (range image) shown in FIG. 3, from the reflected
light received by the light-receiving unit 103.

FIG. 3 shows an 8 x 8 pixel reflected light image as
a part of a 256 x 256 pixel reflected light image for
the sake of simplicity.

Light reflected by an object decreases at a higher
rate with increasing distance to the object. When the
surface of an object uniformly scatters light, the
amount of light received per pixel of the reflected
light image decreases in inverse proportion to the square
of the distance to the object.

Each pixel value of the reflected light image
represents the amount of reflected light received by
a unit light-receiving unit corresponding to that pixel.

Reflected light is influenced by the nature of the
object (specular reflection, scattering, absorption,
and so forth), the direction of the object, the
distance to the object, and the like. When the entire
object uniformly scatters light, the reflected light
amount is intimately related to the distance to the
object. Since a hand or the like has such nature, a
three-dimensional image shown in FIG. 4, which reflects
the distance to the hand, the tilt of the hand (locally
having different distances), and the like, can be
obtained as a reflected light image obtained when the
hand is stretched out to a position in front of the
image capture section 1.

FIG. 5 shows an example of the outer appearance
of the light-emitting unit 101 and light-receiving
unit 103 that construct the image capture section 1
described in Japanese Patent Publication No. 9-299648.
The light-receiving unit 103 comprised of a circular
lens and an area sensor (not shown) located behind
the lens is set at the center of the unit 1, and
a plurality of (e.g., six) light-emitting units 101
each consisting of an LED for emitting light such as
infrared light or the like are set at equal angular
spacings along the perimeter of the circular lens.

Light emitted by each light-emitting unit 101 is
reflected by the object, and the reflected light is
focused by the lens of the light-receiving unit 103 and
is received by the area sensor located behind the lens.
The area sensor consists of sensors in, e.g.,
a 256 x 256 matrix, and the intensity of reflected
light received by each sensor in the matrix becomes
the corresponding pixel value. An image captured in
this manner is a range image, that is, the intensity
distribution of reflected light, as shown in FIG. 3.

FIG. 3 shows part of range image data (8 x 8
pixels as a part of 256 x 256 pixels). In this example,
each cell value (pixel value) in a matrix represents
the intensity of the captured reflected light by 256
levels. For example, a cell with a value "255"
indicates a pixel which is closest to the image capture
section 1, and a cell with a value "0" indicates a
pixel which is farthest from the image capture section
1, i.e., that reflected light does not reach the image
capture section 1.

FIG. 4 three-dimensionally depicts the entire
range image data in the matrix shown in FIG. 3.
This example shows the range image data of the hand of
a person.

FIG. 6 shows an example of a range image of a hand
captured by the image capture section 1. The range
image is a three-dimensional image having depth
information, and is defined by, e.g., 64 pixels in the
x-axis (horizontal) direction, 64 pixels in the y-axis
(vertical) direction, and 256 gray levels in the z-axis
(depth) direction. FIG. 6 expresses each distance
value of a range image, i.e., tone in the z-axis
direction in grayscale. In this case, as a color is
closer to black, it indicates that the distance to the
image capture section 1 is nearer, and as a color is
closer to white, the distance is farther. When a color
is perfectly white, it indicates that there is no image
or it is equivalent to the absence of an image due to
too far a distance even if an image is present. The
intensity of light reflected by an object decreases in
inverse proportion to the square of the distance to the
object. That is, a pixel value Q(i, j) of each pixel
(i, j) in a range image is given by:

Q(i, j) = K/d^2

where K is a coefficient which is adjusted so that a
value Q(i, j) = "255" when d = 0.5 m. By solving the
above equation for d, a distance value can be obtained.
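
As an illustration, solving Q = K/d^2 for d gives
d = sqrt(K/Q). The sketch below assumes the calibration
stated above (Q = 255 at d = 0.5 m); the function and
constant names are hypothetical, and treating a pixel value
of 0 as an infinite distance is an assumption.

```python
import math

Q_REF = 255               # pixel value at the reference distance
D_REF = 0.5               # reference distance in metres (from the text)
K = Q_REF * D_REF ** 2    # coefficient chosen so that Q = 255 when d = 0.5 m


def distance_from_pixel(q):
    """Solve Q = K / d**2 for d. A zero pixel means no reflected light arrived."""
    if q <= 0:
        return math.inf   # assumption: treat "no reflected light" as infinitely far
    return math.sqrt(K / q)
```

With this calibration a pixel value of 255 maps back to
0.5 m, and a pixel value one quarter as large maps to twice
that distance.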

The image deformation section 2 will be explained
below.

The image deformation section 2 performs a
three-dimensional rotation deformation process of a
range image (to be referred to as a sample image
hereinafter) that is always several frames (e.g., one
frame) before the latest image among those contained in
the range image stream of the object to be recognized,
which has been captured by the image capture section 1,
to generate a new range image (to be referred to as a
deformed image hereinafter).

Note that the range image to be used as the sample
image (the number of frames before the latest image) is
determined based on information such as the range image
capture interval (frame rate) of the image capture
section 1, the motion speed of the object, and the like.
If N frames can be captured during a series of motions
such as turning the hand about the y-axis, the sample
image can be arbitrarily selected from range images one
through N frames before the latest image.
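
One possible way to hold the range image stream so that
the sample image can be picked a given number of frames
before the latest image is sketched below. The class name
RangeImageStream and its methods are hypothetical; the
specification only states that frames are held sequentially
in a memory.

```python
from collections import deque


class RangeImageStream:
    """Keeps the most recent range images; the newest frame is stored last."""

    def __init__(self, max_frames):
        # Old frames are discarded automatically once max_frames is exceeded.
        self._frames = deque(maxlen=max_frames)

    def push(self, range_image):
        self._frames.append(range_image)

    def frames_before(self, n):
        """Return the image n frames before the latest (n = 0 is the latest)."""
        return self._frames[-1 - n]


stream = RangeImageStream(max_frames=3)
for frame in ("a", "b", "c", "d"):   # placeholder frames for illustration
    stream.push(frame)
sample_image = stream.frames_before(1)   # one frame before the latest image
```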

The actual three-dimensional rotation deformation
process of the image deformation section 2 for the
range image will be explained in detail below. FIG. 7
is a flow chart for explaining the rotation deformation
process in the image deformation section 2.

When the distance value d(x, y) at each pixel
position (x, y) is z, z stacked cubes (to be referred to
as voxels hereinafter) define that point, and the range
image shown in FIG. 8A is segmented in units of voxels,
as shown in FIG. 8B (step S1).
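
The voxel segmentation of step S1 can be sketched as
follows. The function name and the choice of numbering the
stacked voxels from 1 up to z are assumptions for
illustration.

```python
def segment_into_voxels(range_image):
    """Return the set of voxels (x, y, z) that make up a range image.

    range_image[y][x] holds the distance value d(x, y). A pixel whose
    value is z contributes the column of voxels (x, y, 1) .. (x, y, z).
    """
    voxels = set()
    for y, row in enumerate(range_image):
        for x, z_top in enumerate(row):
            for z in range(1, z_top + 1):
                voxels.add((x, y, z))
    return voxels


image = [[2, 0],
         [1, 3]]
voxels = segment_into_voxels(image)
```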

Note that the aforementioned voxel segmentation
method is an example, and the voxel segmentation range
of the object may be limited, as shown in FIGS. 9A
through 9D. For example, voxels that are infinitely
connected downward may define the point, as shown in
FIG. 9A. Alternatively, absence of voxels below a
given range may be assumed, as shown in FIG. 9B, or
presence of voxels only in the vicinity of a surface
may be assumed, as shown in FIGS. 9C and 9D.



A center (x0, y0, z0) of rotation is determined
(step S2). Note that the central position of rotation 
can be arbitrarily determined depending on the purpose. 
For example, when the face is rotated, the central axis 
of a neck can be set at the center; when the hand is 
rotated, the barycentric position of the hand can be 
set at the center. 

Furthermore, a direction (θx, θy, θz) of rotation
is determined (step S3). Note that θx is the rotational
angle about the x-axis, θy is that about the y-axis, and
θz is that about the z-axis.

In this case, each rotational angle can be
determined based on the motion speed of the object to
be recognized, the range image capture interval (frame
rate) of the image capture section 1, and a range image
selected as the sample image (the number of frames
before the latest image). For example, when rotation
of the hand about the y-axis at around 30°/sec is to
be captured by an image sensing device having a frame
interval of 1/30 sec using a range image one frame before
the latest image as the sample image, since the hand is
rotated 1° per frame, θy = 1° can be set.
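
The arithmetic of this example can be written out as a
short sketch. The function and parameter names are
hypothetical; the relation itself is simply angular speed
multiplied by the time elapsed between the sample image and
the latest image.

```python
def rotation_angle_per_sample(speed_deg_per_sec, frame_interval_sec, frames_back=1):
    """Angle in degrees the object turns between the sample and latest images."""
    return speed_deg_per_sec * frame_interval_sec * frames_back


# Values from the example in the text: about 30 deg/sec, one frame every
# 1/30 sec, and a sample image one frame before the latest image.
theta_y = rotation_angle_per_sample(30.0, 1.0 / 30.0, frames_back=1)
```

Choosing a sample image several frames back scales the
angle proportionally, so a sample three frames back under
the same conditions would call for about 3°.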

A position (x', y', z') after rotation is computed
(step S4) in units of voxels (x, y, z) shown in
FIG. 10A by:

Equation (1) exemplifies a computation formula
used when each voxel (x, y, z) is rotated θx, θy, and θz
respectively about the x-, y-, and z-axes, with the
central position (x0, y0, z0) of rotation obtained in
step S2 in FIG. 7 as the center.

Note that a voxel located at a coordinate position 
(x, y, z) will be referred to as voxel (x, y, z). 

The arithmetic operation in step S4 is made for 
all the voxels (step S5). FIG. 10B shows the rotation 
result of the individual voxels. 

Upon completion of the processing for all the
voxels, a maximum z-value (Zmax) of the voxels (x, y, z)
located at each pixel position (x, y) is obtained,
as shown in FIG. 11A, and is used as a distance value
d(x, y) of a pixel (x, y) of the deformed image, thus
reconstructing the range image (step S6).
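
Steps S4 through S6 can be sketched as follows. This is
an illustrative sketch and not the formula of equation (1):
the order in which the three axis rotations are composed (x,
then y, then z), the rounding of rotated positions to the
nearest pixel, and the function names are all assumptions.

```python
import math


def rotate_voxel(voxel, center, angles_deg):
    """Rotate one voxel about `center` by (theta_x, theta_y, theta_z) degrees.

    Assumption: rotations are applied about the x-, then y-, then z-axis.
    """
    x, y, z = (v - c for v, c in zip(voxel, center))
    tx, ty, tz = (math.radians(a) for a in angles_deg)
    # Rotation about the x-axis.
    y, z = y * math.cos(tx) - z * math.sin(tx), y * math.sin(tx) + z * math.cos(tx)
    # Rotation about the y-axis.
    x, z = x * math.cos(ty) + z * math.sin(ty), -x * math.sin(ty) + z * math.cos(ty)
    # Rotation about the z-axis.
    x, y = x * math.cos(tz) - y * math.sin(tz), x * math.sin(tz) + y * math.cos(tz)
    cx, cy, cz = center
    return (x + cx, y + cy, z + cz)


def reconstruct_range_image(rotated_voxels, width, height):
    """Step S6: keep the maximum z at each pixel (x, y) as d(x, y)."""
    image = [[0] * width for _ in range(height)]
    for x, y, z in rotated_voxels:
        xi, yi, zi = round(x), round(y), round(z)
        if 0 <= xi < width and 0 <= yi < height:
            image[yi][xi] = max(image[yi][xi], zi)
    return image
```

Applying rotate_voxel to every voxel obtained in step S1
and passing the results to reconstruct_range_image yields
a deformed image of the same size as the sample image.
Voxels rotated outside the image bounds are discarded in
this sketch.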

In the above description, the flow of voxel
segmentation and the rotation deformation process by
equation (1) is merely an example, and the present
invention is not limited to such specific flow. A
range image may undergo rotation deformation using
other schemes.

With the aforementioned processes, a range image
obtained by applying an arbitrary three-dimensional
rotation process to the sample image, i.e., the
deformed image, can be generated.

FIG. 12A shows the sample image, and FIG. 12B
shows an example of the deformed image obtained by
three-dimensionally rotating the sample image by the
image deformation section 2. This example depicts the
deformed image obtained by performing the rotation
deformation process of the sample image of the hand
about the y-axis (vertical direction).

The image comparison section 3 will be explained
below.

The image comparison section 3 compares the latest
range image (to be referred to as the latest image
hereinafter) captured by the image capture section 1
with the deformed image obtained by the image
deformation section 2 to check if these two images are
similar to each other.

In this case, similarity is discriminated by
computing correlation between the latest and deformed
images. Template matching is one of such methods, and
computes similarity between the image to be compared
and a template image prepared in advance to check if
the object to be compared is similar to the template
image. In this embodiment, correlation is computed
using this template matching.

More specifically, the deformed image is used as
a template image, and the latest image is compared with
that deformed image using template matching, thus
discriminating the degree of similarity between these
two images.

FIG. 13A shows the deformed image obtained by
rotating the sample image shown in FIG. 12B, and
FIG. 13B shows an example of the latest image. The
degree of similarity between these deformed and latest
images is discriminated.

FIG. 14 is a flow chart for explaining the flow of 
20 the processing using template matching in the image 

comparison section 3. The flow of the processing will 
be explained below with the aid of FIG. 14. 

The positions of the latest image and template
image are normalized if necessary (step S11). The
positions can be normalized by matching the barycentric
positions of the latest and template images with each
other. For this purpose, if (xc, yc, zc) represents
the barycentric position of the latest image, and
(xt, yt, zt) represents that of the template image, the
template image can be translated by distances xc - xt,
yc - yt, and zc - zt respectively in the x-, y-, and
z-directions.

Note that one position normalization scheme has
been explained, but the present invention is not
limited to such specific position normalization method.
For example, both the latest and template images may be
translated to locate their barycenters at a specific
position, or their positions may be normalized using
keys other than the barycentric positions.

The Hamming distance between the latest and
template images is then computed (step S12).
The Hamming distance (H) is computed by:

$$H = \sum_{i} \sum_{j} \left| d(i, j) - t(i, j) \right| \qquad \ldots (2)$$

where i and j are the x- and y-coordinates of each
pixel, d(i, j) is the distance value at the coordinate
position (i, j) of the latest image, and t(i, j) is the
distance value at the coordinate position (i, j) of the
template image.

Note that one method of computing the Hamming
distance has been explained. However, the present
invention is not limited to such specific Hamming
distance computation method, but may use other
computation formulas.

It is then checked if the Hamming distance (H)
value is smaller than a predetermined threshold value
(Th) (i.e., H < Th). If the computed Hamming distance
is smaller than the threshold value, it is determined
that the latest image is similar to the template image
(steps S13 and S14).

With the aforementioned processes, it can be
checked if the latest image is similar to the deformed
image.
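
The flow of FIG. 14 (steps S11 through S14) can be sketched as follows. This is only an illustrative sketch and not the implementation of the embodiment: it assumes that a range image is a two-dimensional NumPy array of distance values, that a pixel value of 0 marks background, and that the x- and y-translations are rounded to whole pixels. The function names are hypothetical.

```python
import numpy as np

def barycenter(img):
    # Assumption: pixels equal to 0 are background; all others belong to the object.
    ys, xs = np.nonzero(img)
    return xs.mean(), ys.mean(), img[ys, xs].mean()

def normalize_position(template, latest):
    # Step S11: translate the template by (xc - xt, yc - yt, zc - zt).
    template = np.asarray(template, dtype=float)
    xc, yc, zc = barycenter(latest)
    xt, yt, zt = barycenter(template)
    dx, dy = int(round(xc - xt)), int(round(yc - yt))
    h, w = template.shape
    shifted = np.zeros_like(template)
    ys, xs = np.nonzero(template)
    ny, nx = ys + dy, xs + dx
    keep = (ny >= 0) & (ny < h) & (nx >= 0) & (nx < w)  # drop pixels shifted out of frame
    shifted[ny[keep], nx[keep]] = template[ys[keep], xs[keep]] + (zc - zt)
    return shifted

def hamming_distance(latest, template):
    # Step S12, equation (2): sum over all pixels of |d(i, j) - t(i, j)|.
    return float(np.abs(np.asarray(latest, dtype=float) - template).sum())

def is_similar(latest, template, th):
    # Steps S13 and S14: the images are judged similar when H < Th.
    return hamming_distance(latest, normalize_position(template, latest)) < th
```

The threshold value Th is left to the caller, since the description does not fix it.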

Note that this embodiment has exemplified the
method of computing similarity between the latest and
template images by obtaining the Hamming distance
therebetween. However, the present invention is not
limited to this method, but may use other computation
methods such as a method using a computation of
distance having a different definition from the Hamming
distance, and the like.

Also, this embodiment has exemplified the method
of computing correlation using template matching.
However, the present invention is not limited to such
specific correlation computation method, but may use
various other possible choices such as a method using
DP matching, KL transformation, or the like, a method
of computing the Fourier transforms of the two images
and analyzing correlation between the images after
Fourier transformation, and the like.

As described above, according to this embodiment,
whether or not an image obtained by rotating an image
several frames (e.g., one frame) before the latest
image is similar to the latest image can be detected.
That is, whether or not the object has rotated between
a timing several frames before the current timing and
the current timing can be recognized.

Furthermore, in this embodiment, the range image is
actually three-dimensionally rotated, and recognition
is done using that three-dimensional information,
unlike the conventional recognition method for
estimating three-dimensional rotation from
two-dimensional information in a two-dimensional image
(e.g., rotation of the hand about the y-axis is
estimated because the projection area of the hand in
the x-axis direction (horizontal direction) decreases).
For this reason, recognition can be done more reliably
and stably than the conventional method.

In conventional recognition using template
matching, a large number of template images must be
prepared in advance. However, in the method of this
embodiment, since the deformed image is generated in
real time, and is used as a template image, no template
image need be prepared in advance, and memory resources
or the like can be prevented from being wasted, thus
allowing efficient processing.
Also, in conventional recognition using template
matching, since it is impossible to prepare every kind
of template image in practice, some template images
having representative shapes of objects to be
recognized are normally prepared. For example, in case
of face recognition, faces of male and female adults,
children, aged persons, and the like for several
persons are prepared. At this time, since a
representative shape is used as a template image to be
compared, it is often different from the current object
to be recognized in details, and such difference is one
factor that lowers similarity. When recognition is
done for unspecified objects, the recognition rate
cannot be improved unless the largest possible number of
template images is prepared. However, since the
method of this embodiment generates a template image by
deforming the object to be recognized itself, such
problem can be solved.

To restate, according to this embodiment, since a
range image several frames before the latest image of
the object to be recognized undergoes three-dimensional
rotation deformation in real time, and the image
obtained by deformation is compared with the current
range image, three-dimensional rotation of the object
can be stably recognized in real time.

This embodiment is not limited to the aforementioned
arrangement, and various modifications of this
embodiment may be made. Some modifications of this
embodiment will be explained below.

(First Modification of First Embodiment)

The image capture section 1 may capture range
images at specific timings (e.g., a user instruction or
the like) in place of predetermined time intervals.

In this manner, three-dimensional rotation at
arbitrary time intervals can be recognized. For
example, the user instructs the start and end timings
using a switch to detect whether or not rotation has
taken place during that interval.

For example, vehicles must be equipped with
airbags to cushion collision shocks in an accident.
Upon inflating an airbag, the direction and position of 
the face of a passenger at the front passenger seat 
must be detected to prevent the passenger from being 
excessively pressed by the airbag. At this time, when 
a person sits at a seat and fastens a seatbelt, a range 
image of the face of the passenger at the front 
passenger seat is captured, and a deformed image is 
obtained using the captured image as a sample image. 
Then, a range image of the face immediately before 
inflation of the airbag is captured, and is compared 
with the deformed image, thus recognizing the position 
and direction of the face of the passenger with high 
precision. 

(Second Modification of First Embodiment) 

The image deformation section 2 may generate
a deformed image by the three-dimensional rotation
deformation process of a range image several frames
(e.g., one frame) before a range image of a specific
past frame in place of the latest image, and the image
comparison section 3 may compare the range image of
the frame used as the reference image in the image
deformation section 2 with the deformed image obtained
by the image deformation section 2 to check if these
two images are similar to each other.

In this manner, three-dimensional rotation at
a specific past timing can be recognized.

(Third Modification of First Embodiment) 

The image deformation section 2 and image
comparison section 3 may be modified as follows.

A new image deformation section 2 performs a
plurality of three-dimensional rotation deformation
processes having different deformation parameters for
the sample image to generate a plurality of deformed
images.

A new image comparison section 3 compares the
latest image captured by the image capture section 1
with the plurality of deformed images with different
deformation parameters obtained by the new image
deformation section 2 to check if the deformed images
include those similar to the latest image. If such
images are found, the one deformed image having the
highest similarity with the latest image is detected.

An example of the processes in the new image
deformation section 2 of this modification will be
explained below.

Assume that the range image of the face of a
person shown in, e.g., FIG. 15 is captured as a sample
image.

Four deformed images are generated by rotating
this sample image of the face through an identical
angle respectively in the up, down, right, and left
directions, as shown in FIGS. 16 through 19. FIG. 16
shows a deformed image obtained by rotating the sample
image through (θx, θy, θz) = (2, 0, 0) to have the
barycentric position of the head as the center to turn
the face slightly upward. Also, FIG. 17 shows a
deformed image obtained by rotating the sample image
through (θx, θy, θz) = (-2, 0, 0) to have the
barycentric position of the head as the center to turn
the face slightly downward. Likewise, FIG. 18 shows
a deformed image obtained by rotating the sample image
through (θx, θy, θz) = (0, 2, 0) to have the barycentric
position of the head as the center to turn the face
slightly rightward on the plane of paper. Furthermore,
FIG. 19 shows a deformed image obtained by rotating the
sample image through (θx, θy, θz) = (0, -2, 0) to have
the barycentric position of the head as the center to
turn the face slightly leftward on the plane of paper.
Note that the unit of angle is "degree".
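
As a rough illustration of how the four deformation parameter sets could be applied, the sketch below rotates a set of three-dimensional points about their barycentric position. It is not equation (1) of this description: the representation of the object as an N x 3 array of (x, y, z) points, the order in which the three axis rotations are composed, and the sign convention of each angle are all assumptions made for this example, and the names are hypothetical.

```python
import numpy as np

def rotation_matrix(theta_x, theta_y, theta_z):
    # Angles are given in degrees, as in the description.
    ax, ay, az = np.radians([theta_x, theta_y, theta_z])
    rx = np.array([[1, 0, 0],
                   [0, np.cos(ax), -np.sin(ax)],
                   [0, np.sin(ax), np.cos(ax)]])
    ry = np.array([[np.cos(ay), 0, np.sin(ay)],
                   [0, 1, 0],
                   [-np.sin(ay), 0, np.cos(ay)]])
    rz = np.array([[np.cos(az), -np.sin(az), 0],
                   [np.sin(az), np.cos(az), 0],
                   [0, 0, 1]])
    return rz @ ry @ rx  # composition order is an assumption

def rotate_about_barycenter(points, angles):
    # Rotate every point about the barycentric position of the point set.
    points = np.asarray(points, dtype=float)
    g = points.mean(axis=0)
    return (points - g) @ rotation_matrix(*angles).T + g

# Parameter sets (theta_x, theta_y, theta_z) quoted for FIGS. 16 through 19.
PARAMETERS = {"up": (2, 0, 0), "down": (-2, 0, 0),
              "right": (0, 2, 0), "left": (0, -2, 0)}

def deformed_point_sets(points):
    # One deformed point set per deformation parameter set.
    return {name: rotate_about_barycenter(points, a)
            for name, a in PARAMETERS.items()}
```

Converting each rotated point set back into a range image (for example, by the voxel-based procedure described earlier) is outside the scope of this sketch.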

In FIGS. 16 through 19, since the sample image is
rotated in the respective directions, the color of a
portion deformed in a direction to approach the image
capture section 1 becomes closer to black, and that of
a portion deformed in a direction to be farther from
the image capture section 1 becomes closer to white.

Using all four deformed images shown in
FIGS. 16 through 19 obtained by the image deformation
section 2, the image comparison section 3 performs
template matching with the latest image shown in
FIG. 20 to check if the four deformed images include
ones similar to the latest image, and to detect the
deformed image with highest similarity if such images
are found.

FIG. 21 is a flow chart for explaining the flow of
the processing using template matching in the new image
comparison section 3. The flow of the processing will
be explained below with reference to FIG. 21.

One template image (Ti) is selected (step S21).
That is, Ti is one of the four deformed images shown in
FIGS. 16 through 19.

The positions of the selected image and latest
image are normalized if necessary (step S22) as in the
description of FIG. 14.

The Hamming distance between the latest image and
template image (Ti) is computed (step S23) as in the
description of FIG. 14.

It is checked if the Hamming distances have been
computed for all the template images (step S24).
If the Hamming distances to be computed still remain,
the flow returns to step S21. Upon completion of
computations for all the template images, a template
image (Tmin) that yields the minimum Hamming distance
(Hmin) is selected (step S25). In this case, assume
that the deformed image shown in, e.g., FIG. 16 is
selected.

It is checked if this Hamming distance (Hmin)
value is smaller than a predetermined threshold value
(Th) (i.e., Hmin < Th) (step S26). If the Hamming
distance is smaller than the threshold value, it is
determined that the latest image is similar to the
template image (Tmin) (step S27). If this condition is
satisfied, it is determined that the deformed image
shown in FIG. 16 is similar to the latest image.

With the aforementioned processes, a deformed
image having highest similarity to the latest image can
be detected. (A result that indicates the absence of
any similar deformed image may be obtained (step S28).)
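
The selection loop of FIG. 21 (steps S21 through S28) can be sketched as follows. This is an illustrative sketch only; it assumes that the latest image and every template are NumPy arrays of identical shape whose positions have already been normalized (step S22), and the function name is hypothetical.

```python
import numpy as np

def best_template(latest, templates, th):
    """Return (index, Hmin) of the most similar template, or (None, Hmin) if Hmin >= Th."""
    # Steps S21 through S24: compute the Hamming distance to every template.
    distances = [float(np.abs(latest - t).sum()) for t in templates]
    # Step S25: select the template (Tmin) that yields the minimum distance (Hmin).
    k = int(np.argmin(distances))
    # Step S26: compare Hmin with the threshold value Th.
    if distances[k] < th:
        return k, distances[k]   # step S27: Tmin is similar to the latest image
    return None, distances[k]    # step S28: no similar deformed image
```

If the templates are stored in the order of FIGS. 16 through 19, the returned index identifies the direction of rotation.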

In this manner, since it is determined that the
latest image shown in FIG. 20 is similar to the
deformed image that turned the face upward shown in
FIG. 16, it is recognized that "the person to be
recognized has turned his or her face upward".

As described above, in this modification, the
direction of rotation (in which direction the object
has rotated) of the object to be recognized can be
recognized.

Note that this modification has exemplified the
method of generating deformed images in the four, i.e.,
up, down, right, and left directions. However, this method
is an example of a plurality of three-dimensional
rotation deformation processes with different
deformation parameters, and the present invention is
not limited to this method, but can freely select
directions to be rotated in correspondence with the
purposes of recognition. For example, the number of
directions to be rotated may be increased to eight,
i.e., up, down, right, left, upper right, upper left,
lower right, and lower left directions, the number of
directions only on, e.g., the right side may be
increased to mainly check that side, identical
directions to be rotated having different rotational
angles may be prepared, and so forth.

For example, rotation of the hand will be examined.
In place of generating deformed images shown in
FIGS. 23 through 26 by rotating a sample image shown in
FIG. 22 in the four, i.e., up, down, right, and left
directions to have the barycentric position of the hand
as the center, a plurality of deformed images having
different rotational angles (two angles, i.e., 1° and 2°
in this example) in each of the up, down, right, and
left directions may be generated, as shown in FIGS. 27A
through 30B.

FIG. 27A shows a deformed image generated by
rotating the sample image through (θx, θy, θz) = (1, 0,
0) to turn the hand slightly upward, and FIG. 27B shows
a deformed image generated by rotating the sample image
through another angle, i.e., (θx, θy, θz) = (2, 0, 0)
to turn the hand further upward. FIG. 28A shows a
deformed image generated by rotating the sample image
through (θx, θy, θz) = (-1, 0, 0) to turn the hand
slightly downward, and FIG. 28B shows a deformed image
generated by rotating the sample image through another
angle, i.e., (θx, θy, θz) = (-2, 0, 0) to turn the hand
further downward. FIG. 29A shows a deformed image
generated by rotating the sample image through (θx, θy,
θz) = (0, 1, 0) to turn the hand slightly rightward on
the plane of paper, and FIG. 29B shows a deformed image
generated by rotating the sample image through another
angle, i.e., (θx, θy, θz) = (0, 2, 0) to turn the hand
further rightward. FIG. 30A shows a deformed image
generated by rotating the sample image through (θx, θy,
θz) = (0, -1, 0) to turn the hand slightly leftward on
the plane of paper, and FIG. 30B shows a deformed image
generated by rotating the sample image through another
angle, i.e., (θx, θy, θz) = (0, -2, 0) to turn the hand
further leftward.

In this manner, since a plurality of deformed
images having different rotational angles (two angles,
i.e., 1° and 2° in this example) in each of the up, down,
right, and left directions are prepared, not only the
direction of rotation of the object to be recognized
but also the rotation amount can be recognized.
At this time, if the range image capture frame
rate remains the same, since the rotation amount is
proportional to the motion speed, both the motion
direction and speed can be recognized at the same time.
That is, not only the motion of the object, i.e.,
toward which side a person has turned his or her hand,
is recognized but also that motion speed can be obtained
at the same time.
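
The proportionality between rotation amount and motion speed at a fixed frame rate can be written out as a small sketch. The function name, the frame rate of 30 frames per second used in the usage note, and the frame_gap parameter (the number of frames between the sample image and the latest image) are assumptions made for illustration.

```python
def angular_speed_deg_per_s(angle_deg, frame_rate_hz, frame_gap=1):
    # frame_gap frames span frame_gap / frame_rate_hz seconds, so the
    # angular speed is the recognized rotation amount divided by that time.
    return angle_deg * frame_rate_hz / frame_gap
```

For instance, under these assumptions, a recognized rotation of 2° between consecutive frames captured at 30 frames per second corresponds to 60° per second.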

(Fourth Modification of First Embodiment) 

The image deformation section 2 of the first
embodiment generates a deformed image by a
three-dimensional rotation deformation process. Alternatively,
a range image that has not undergone any deformation,
i.e., a sample image itself, may be directly used as
the deformed image.

In this case, whether or not the object to be
recognized stands still can be recognized.

(Fifth Modification of First Embodiment)

The image deformation section 2 of the first
embodiment generates a deformed image by the
three-dimensional rotation deformation process, but may
generate a deformed image by a translation deformation
process.

In this case, three-dimensional translation of the
object to be recognized can be recognized.

The translation deformation process has been 
exemplified as one scheme of deformation means in the 
image deformation section 2. However, the present 
invention is not limited to such specific deformation 
means, but can use various other deformation means 
such as enlargement/reduction, reversal (mirroring), 
trimming, and the like. 

In this manner, motions that are associated with 
arbitrary deformations such as three-dimensional 
enlargement/reduction and the like of the object to be 
recognized can be recognized. 

Furthermore, when the image deformation section 2 
performs a combination of these deformation processes, 
not only single motion such as rotation, translation, 
or the like but also arbitrary motions can be 
recognized. 

For example, when a deformed image is generated by 
combining "translation deformation of the hand image in 
the z-axis (depth) direction" and "rotation about the 
y-axis (vertical direction)", a combination of back- 
and-forth motion in the depth direction and right-and- 
left rotation about the vertical axis of the hand can 
be recognized. 
(Second Embodiment) 

The second embodiment of the present invention
will be described below.

FIG. 31 is a block diagram showing the overall
arrangement of an image recognition apparatus according
to the second embodiment of the present invention.

The image recognition apparatus of this embodiment
comprises a motion recognition section 4 for
implementing motion recognition using the comparison result
in the image comparison section 3 in addition to the
arrangement of the image recognition apparatus of the
first embodiment.

The motion recognition section 4 will be explained 
below. 

Using the image recognition apparatus according to
the first embodiment, whether or not rotation has taken
place between a timing of the latest frame and a timing
several frames before can be discriminated. Since the
image capture section 1 in the first embodiment
sequentially captures range images at predetermined intervals,
one of two choices "rotated" and "not rotated" is
obtained in turn as a recognition result by repeating
discrimination of rotation every time the latest frame
is obtained.

The motion recognition section 4 recognizes motion,
i.e., what meaning the detected rotation has, using a
sequence of discrimination results indicating whether
or not rotation has taken place, which are sequentially
obtained by the image comparison section 3.

Since the image comparison section 3 sequentially
obtains discrimination results each indicating whether
or not rotation has taken place at the latest frame,
the number of times motion has occurred within an
arbitrary time interval can be detected by counting
a total number of rotations that occurred within that
time interval.

With this technology, even a person who cannot
talk owing to some disease, accident, or the like and
can only move his or her hands can explicitly express
his or her will; e.g., "Yes" when he or she turns the
hand once, "No" when twice, "Want to do something" when
three times, and so on. Conventionally, when a patient
in a sick room has some abnormal situation, he or she
calls a nurse or doctor by a button type buzzer at his
or her bedside and talks to the nurse or doctor via
an interphone to give the information needed. However,
when a patient cannot talk, mutual understanding can
hardly be achieved until the nurse or doctor reaches
the sick room. In such case, when the image
recognition apparatus of the present invention is used in
place of the buzzer or interphone, mutual understanding
can be achieved even when the doctor or nurse is not
present at that place.
Furthermore, the image recognition apparatus of
this embodiment can obtain a pattern of motions
indicating that, e.g., rotation was "done, done, not
done, done, not done, ..." within a specific period of
time.

In this manner, when the image recognition
apparatus of this embodiment is connected to, e.g.,
a personal computer (PC), if actions to be taken
in response to predetermined motion patterns are
determined in advance, the PC can be operated by the
user's hand actions. For example, when a motion pattern
indicating that rotation was "not done, not done, done"
is obtained, wordprocessing software is launched; when
a motion pattern indicating that rotation was "not
done, done, done" is obtained, spreadsheet software is
launched, and so forth. Likewise, hand actions can
operate various devices. For example, home electronic
apparatuses such as a TV, video player, and the like
can be operated by hand actions; the power switch is
turned on upon detecting a given pattern, the tone
volume is increased/decreased upon detecting another
pattern, and so on. Also, a game machine can be
operated by hand actions to change character motions
depending on patterns.

For example, when the image comparison section 3
counts the number of rotations of the hand within an
arbitrary time interval, the motion recognition section
4 has a table that stores the meanings (types) of
motions such as "Yes" when the user turns the hand once,
"No" when twice, "Want to do something" when three
times, and so on. The motion recognition section 4
looks up this table to recognize and output the type of
motion corresponding to the number of motions (e.g.,
hand rotations) obtained from the recognition result in
the image comparison section 3.
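
A table lookup of this kind can be sketched as follows. The sketch assumes that the image comparison section delivers one Boolean result per frame (True for "rotated") and that each True result counts as one rotation; the table contents are the examples given above, and the names are hypothetical.

```python
# Meanings of motions keyed by the number of rotations within the interval.
MEANING_BY_COUNT = {1: "Yes", 2: "No", 3: "Want to do something"}

def count_rotations(results):
    # Assumption: each True entry is one detected rotation.
    return sum(1 for r in results if r)

def recognize_meaning(results):
    # Returns None when the count has no entry in the table.
    return MEANING_BY_COUNT.get(count_rotations(results))
```

A practical system might instead count runs of consecutive "rotated" results as a single rotation; that choice is not specified in the description.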

For example, when the image comparison section 3
recognizes whether rotation is "done" or "not done",
the motion recognition section 4 has a table for
pre-storing types of actions such as "to launch
wordprocessing software" in response to a pattern of
a series of motions indicating that rotation was "not
done, not done, done", and "to launch spreadsheet
software" in response to a pattern of a series of
motions indicating that rotation was "not done, done,
done". The motion recognition section 4 looks up
this table to recognize and output the type of action
corresponding to a predetermined pattern of a series of
motions obtained from the recognition result of the
image comparison section 3.
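
The pattern table can be sketched in the same way. Here a pattern is assumed to be the last three per-frame results (True for "done", False for "not done"); the window length of three and the names are assumptions, while the two table entries are the examples given above.

```python
# Actions keyed by a pattern of three successive discrimination results.
ACTION_BY_PATTERN = {
    (False, False, True): "launch wordprocessing software",
    (False, True, True): "launch spreadsheet software",
}

def recognize_action(results):
    # Look up the most recent three results; None means no registered pattern.
    return ACTION_BY_PATTERN.get(tuple(results[-3:]))
```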

(First Modification of Second Embodiment)

When the motion recognition section 4 is added to
the image recognition apparatus described in the third
modification of the first embodiment, the motion
recognition section 4 can detect a series of motions.

For example, when images of the face are sensed,
and a person makes a series of motions such as "turns
the face rightward", "does not turn the face (stands
still)", "turns the face leftward", "does not turn
the face", "turns the face rightward", ..., it can
be recognized that the person is shaking the head
horizontally (saying "No").
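
One possible way to detect such a head shake from a series of per-frame direction results is sketched below. The labels "right", "left", and "still", and the requirement of at least three alternating turns, are assumptions chosen for this example rather than values taken from the description.

```python
def is_head_shake(motions, min_turns=3):
    # Ignore frames in which the face stands still.
    turns = [m for m in motions if m != "still"]
    # Only horizontal turns may appear, and there must be enough of them.
    if len(turns) < min_turns or any(m not in ("right", "left") for m in turns):
        return False
    # Successive turns must alternate between rightward and leftward.
    return all(a != b for a, b in zip(turns, turns[1:]))
```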
(Second Modification of Second Embodiment)

When the motion recognition section 4 is added
to the fifth modification of the first embodiment, a
series of motions such as "translation to the right",
"rotation about the vertical axis", "movement in the
depth direction", and the like of, e.g., the hand can
be recognized.

In this manner, unique motions that only a given
person knows are registered in a personal
authentication apparatus such as an auto-locking apparatus of
a door, an ATM apparatus in a bank, or the like, and
whether or not the person is authentic can be detected
by checking if motions of the person in the
authentication process of that apparatus match the registered
ones. In this manner, the image recognition apparatus
of this embodiment can be used in personal
authentication.

Furthermore, since individual deformation
parameters upon deformation in the image deformation
section 2 are known in advance, arbitrary motion can
be formulated into equations by holding all these
parameters.

Human motions are ambiguous unlike those of
machines, and it is conventionally difficult to
describe them by formulas using parameters such as the
center of rotation, rotational angle, and the like.
However, according to this modification, human motions
can be clearly formulated into equations.

(Third Embodiment) 

The third embodiment of the present invention will
be explained below.

FIG. 32 is a block diagram showing the overall
arrangement of an image recognition apparatus according
to the third embodiment of the present invention.

The image recognition apparatus of this embodiment
comprises an image holding section 6 for holding range
images or range image streams (a plurality of
time-serially continuous range images captured at given time
intervals), and an image designation section 5 for
extracting an arbitrary range image or range image
stream from those held in the image holding section 6,
in addition to the arrangement of the image recognition
apparatus of the first embodiment.

The image holding section 6 will be explained
first.

The image holding section 6 holds range images or
range image streams captured inside or outside the
image recognition apparatus of this embodiment. As a
holding method, for example, a range image or range
image stream to be held is stored in a hard disk,
silicon disk, memory, or the like as one or a plurality
of files.

Note that the aforementioned holding method is
merely an example, and the present invention is not
limited to such specific method. Arbitrary holding
methods can be used. Also, two or more holding methods
can be used at the same time.

The holding location need not be a single place.
Files may be divisionally held in a plurality of hard
disks or may be distributed and held in hard disks in
PCs, which are located at physically different places
(e.g., Tokyo and New York), via a network.

Furthermore, the image holding section 6 can hold
arbitrary range images or range image streams generated
by an external apparatus, range images or range image
streams captured by the image capture section 1, range
images deformed by the image deformation section 2, and
the like at arbitrary timings.

The image designation section 5 will be described
below.

The image designation section 5 extracts an
arbitrary range image or range image stream held in
the image holding section 6, and passes it to the image
deformation section 2.

The image designation section 5 can also extract
only some range images in a range image stream.
For example, when a range image stream consists of 10
frames, the image designation section 5 can extract
only five frames (e.g., the third to seventh frames),
and can pass them as a range image stream consisting of
five frames. Also, the image designation section 5 can
pass an arbitrary range image in a range image stream.
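
The extraction of part of a stream can be sketched as follows, assuming that a range image stream is held as a Python sequence of frames and that frames are numbered from 1 as in the example above. The function name is hypothetical.

```python
def extract_frames(stream, first, last):
    """Return frames first..last (1-based, inclusive) as a new stream."""
    return list(stream[first - 1:last])
```

With a 10-frame stream, `extract_frames(stream, 3, 7)` yields a stream of five frames, i.e., the third to seventh frames.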

In this case, the image deformation section 2
generates a deformed image using a range image
extracted by the image designation section 5 as a
sample image in place of that captured by the image
capture section 1.

In this manner, according to this embodiment,
recognition can be made using deformed images of
pre-stored range images as template images unlike in the
first embodiment.

That is, when range images to be used in
recognition of a given motion are registered in advance
in the image holding section 6, whether or not that
motion has taken place can be recognized.
(First Modification of Third Embodiment) 

The third embodiment may further comprise the
motion recognition section 4 that has been explained in
the second embodiment.

In this case, an image recognition apparatus which
can obtain the effects of the second embodiment in
addition to those of this embodiment can be constructed.

(Second Modification of the Third Embodiment) 

In this embodiment, a range image or range image
stream extracted by the image designation section 5 is
input to the image deformation section 2 to generate
a deformed image, and the generated deformed image is
used as an image to be compared (template image) in the
image comparison section 3. Alternatively, a range
image or range image stream extracted by the image
designation section 5 may be directly input to the
image comparison section 3 to be used as an image to be
compared.

In this manner, a range image stream indicating
a motion sequence to be recognized is registered in
advance in the image holding section 6, is extracted by
the image designation section 5, and can be compared
with a range image stream captured by the image capture
section 1.

That is, according to this modification, whether
or not registered motion has taken place can be
recognized.

When range images that have undergone a
deformation process equivalent to that in the image
deformation section 2 are held in the image holding
section 6, comparison can be made without requiring any
computation cost for the deformation process in the
image deformation section 2. In this manner, real-time
performance can be further improved.

(Fourth Embodiment) 

The fourth embodiment of the present invention
will be described below.

FIG. 33 is a block diagram showing the overall
arrangement of an image recognition apparatus according
to the fourth embodiment of the present invention.

The image recognition apparatus of this embodiment
comprises a motion prediction section 7 that predicts
future motion in addition to the arrangement of the
image recognition apparatus of the second embodiment.

The motion prediction section 7 will be explained
first.

The motion prediction section 7 predicts future
motion using the result of the motion recognition
section 4.

For example, when an object successively "rotates
about the vertical axis" three times, it is predicted
that the object will "rotate about the vertical axis"
or will "quit its motion" in the next frame (a frame
one frame after the current frame).

In this case, as keys for prediction, background 
2 0 knowledge such as a structural nature or the like of 

the object may be taken into consideration in addition 
to the result in the aforementioned motion recognition 
section 4. For example, the fingers of a person have 
a limited motion range due to their structures. Such 
25 knowledge may be considered as a key for prediction. 

The motion prediction section 7 may have a table
that stores expected motions in response to motion
(e.g., three successive rotations about the vertical
axis) recognized by the motion recognition section 4,
and may make motion prediction with reference to this
table.

In this manner, the next motion of the object can 
be predicted. 
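
The table-based prediction described above can be sketched
as follows. This is only an illustrative sketch: the table
layout, the three-motion window, and the names
PREDICTION_TABLE and predict_next are assumptions and do
not appear in the original description.

```python
# Hypothetical motion labels; the text gives these two only as examples.
ROTATE_V = "rotate about the vertical axis"
STOP = "quit its motion"

# Key: the most recent recognized motions (oldest first).
# Value: the motions expected in the next frame.
PREDICTION_TABLE = {
    (ROTATE_V, ROTATE_V, ROTATE_V): [ROTATE_V, STOP],
}


def predict_next(history, table=PREDICTION_TABLE, window=3):
    """Look up the expected next motions for the last `window` recognized motions.

    Returns an empty list when the table has no entry for that sequence.
    """
    key = tuple(history)[-window:]
    return table.get(key, [])


if __name__ == "__main__":
    print(predict_next([ROTATE_V, ROTATE_V, ROTATE_V]))
```

Background knowledge, such as the limited motion range of
a finger, could be reflected simply by leaving impossible
motions out of the table entries.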

(First Modification of Fourth Embodiment) 

The image recognition apparatus of the fourth 
embodiment comprises the motion prediction section 7 
for predicting motion using the recognition result in 
the motion recognition section 4. In place of that
motion prediction section 7, the apparatus may comprise
a feature amount extraction section 8 for extracting a 
feature amount from a range image or range image stream 
captured by the image capture section 1 or the like, 
and a motion prediction section 7 for predicting motion 
using information of the feature amount from the 
feature amount extraction section 8, as shown in 
FIG. 34. 

In this case, the feature amount extraction 
section 8 extracts the feature amount of an object; for 
example, the barycentric position of the object from 
a range image, the barycentric speed of the object from 
a range image stream, and the like. 

A case will be explained below in which a barycentric
position G of an object is computed from a range image.

Case 1: When the range image is handled intact:

[Equation (3): a double sum over x = 0 to m-1 and
y = 0 to n-1; the summand is not recoverable from the
source]   ... (3)

Case 2: When the range image is handled as shown in
FIG. 9B:

[Equation (4): not recoverable from the source]   ... (4)

where Fmin is the minimum value (≠ 0) of F(x, y).

Case 3: When only the surface of the range image is
handled, as shown in FIG. 9D:

[Equation (5): not recoverable from the source]   ... (5)

In equations (3) to (5), F(x, y) is the pixel
value of a pixel (x, y), and C(x, y) is a function
defined as:

•when F(x, y) ≠ 0, C(x, y) = 1
•otherwise, C(x, y) = 0

Also, m and n are the x- and y-sizes (the numbers
of pixels) of a frame.

The speed of the barycentric position can be
easily computed from the displacement of the
barycentric position of the object between
continuously captured range images and the time
interval between those images.
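
The computation of a barycentric position and its speed
can be sketched as follows. This is a minimal sketch that
assumes a plain weighted centroid in the image plane, with
each pixel weighted either by its value F(x, y) or by the
occupancy function C(x, y). It is not a reproduction of
equations (3) to (5), and the function names are
hypothetical.

```python
def occupancy(f):
    """C(x, y) as defined in the text: 1 where F(x, y) != 0, otherwise 0."""
    return 1 if f != 0 else 0


def barycenter(frame, weight=lambda f: f):
    """Weighted centroid (gx, gy) of a range image.

    `frame` is a list of rows, so frame[y][x] is the pixel value F(x, y).
    `weight` maps a pixel value to its weight (assumption: F itself by default).
    Returns None when every weight is zero, i.e. no object is present.
    """
    total = sum_x = sum_y = 0.0
    for y, row in enumerate(frame):
        for x, f in enumerate(row):
            w = weight(f)
            total += w
            sum_x += w * x
            sum_y += w * y
    if total == 0:
        return None
    return (sum_x / total, sum_y / total)


def barycentric_speed(g_prev, g_curr, dt):
    """Displacement of the barycenter between two frames divided by their time interval."""
    return tuple((c - p) / dt for p, c in zip(g_prev, g_curr))


if __name__ == "__main__":
    frame_a = [[0, 0, 0], [0, 4, 0], [0, 0, 0]]
    frame_b = [[0, 0, 0], [0, 0, 4], [0, 0, 0]]
    g_a, g_b = barycenter(frame_a), barycenter(frame_b)
    print(g_a, g_b, barycentric_speed(g_a, g_b, dt=0.5))
```

Passing weight=occupancy treats every object pixel equally,
which corresponds to using only C(x, y).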

Note that the barycentric position and speed have
been exemplified as feature amounts. However, the
present invention is not limited to such specific
feature amounts, and various other feature amounts such
as the area, volume, shape, and the like
of an object can be used. Since these feature amounts
can be easily obtained from edge information, depth
information, and the like of an object extracted from
a range image using a conventional scheme, a detailed
description thereof will be omitted.

The motion prediction section 7 then predicts the
next motion using, as keys, feature amounts such as
changes in barycentric position, barycentric speed, and
the like of the object obtained by the feature amount
extraction section 8.

In this manner as well, the next motion of the
object can be predicted as in the fourth embodiment.

Furthermore, the motion prediction section 7
may simultaneously use both the result of the motion
recognition section 4 that has been explained in the
fourth embodiment and the feature amount extracted
by the feature amount extraction section 8 so as to
predict the next motion.

In this manner, more stable and reliable 
prediction can be implemented since more kinds of 
information can be used as keys for prediction. 

For example, when an airbag is to be inflated upon
a vehicle accident, the airbag must be inhibited from
inflating if a person is present within a very
close range. However, a conventional distance sensor
using an ultrasonic wave, infrared light, or the like
can detect whether or not an object is present within
a given range, but cannot discriminate whether the
object is a fly, a ball thrown by a child in the rear
passenger seat, or a person. According to the present
invention, a person can be distinguished from other
objects with high precision on the basis of a feature
amount such as a volume computed from a
range image by the feature amount extraction section 8,
and the motion of a person predicted by the motion
prediction section 7.

(Second Modification of the Fourth Embodiment) 

As shown in FIG. 35, the result of the motion
prediction section 7 may be used as a key for computing
a deformation parameter in the image deformation
section 2.

For example, the third modification of the first
embodiment has explained the method of deforming a
range image of an object in four directions, i.e., up,
down, right, and left. When motion is predicted
using the motion prediction section 7, the candidate
deformations can be narrowed down. For example,
when it is predicted that the object is unlikely to
rotate in the right direction, it can be determined
that a range image need only be deformed in
three directions, i.e., up, down, and left. In this
manner, the number of deformations can be limited.

As a result, unnecessary deformation processes
can be omitted and the computation cost can be
reduced, thus further improving the real-time performance
of recognition.
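
The narrowing of deformation candidates can be sketched as
follows. This is an illustrative sketch only: how the
motion prediction section 7 reports "unlikely" directions
is not specified in the text, so the set-based interface
and the name directions_to_try are assumptions.

```python
# The four deformation directions named in the text.
ALL_DIRECTIONS = ("up", "down", "right", "left")


def directions_to_try(unlikely):
    """Return the deformation directions left after dropping the unlikely ones.

    `unlikely` is a collection of direction names that the prediction step
    (hypothetical interface) considers improbable for the next frame.
    """
    return [d for d in ALL_DIRECTIONS if d not in unlikely]


if __name__ == "__main__":
    # Rotation to the right is predicted to be unlikely, so only three
    # deformed images need to be generated instead of four.
    print(directions_to_try({"right"}))
```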

(Third Modification of Fourth Embodiment) 

The arrangement shown in FIG. 34 or 35 may further
comprise the image holding section 6 for holding range
images captured by the image capture section 1 and
deformed images generated by the image deformation
section 2, as has been explained in the third
embodiment.

In this case, an image recognition apparatus which
can obtain the effects of the third embodiment in
addition to those of this embodiment can be constructed.

(Fifth Embodiment) 

The fifth embodiment of the present invention will 
be described below. 

FIG. 36 shows an example of the arrangement of
an image recognition apparatus according to the fifth
embodiment of the present invention.

The image recognition apparatus of this embodiment
comprises an image compression section 9 for compressing
an image on the basis of the recognition result of
the image recognition apparatus in addition to the
arrangement of the image recognition apparatus of the
first, second, third, or fourth embodiment.

With this arrangement, a range image can be
compressed using the result obtained by the first,
second, third, or fourth embodiment.

In FIG. 36, a range image is compressed on the
basis of various parameters used in the image deformation
section 2 using the recognition result in the
motion recognition section 4. However, the present
invention is not limited to such a specific arrangement.
For example, a range image may be compressed on
the basis of various parameters used in the image
deformation section 2 using the recognition result in
the image comparison section 3. That is, the present
invention is not particularly limited as long as
a range image is compressed on the basis of various
parameters and the like used in the image deformation
section using the recognition result.

The image compression section 9 will be explained
below.

The image compression section 9 compresses data
of a range image or range image stream captured by the
image capture section 1 or the like on the basis of the
recognition result.

In this case, the image compression method can be
either reversible (lossless) or irreversible (lossy)
compression depending on the purpose.

More specifically, compression is done by the
following method.

For example, when the image compression section 9
is added to the image recognition apparatus with the
arrangement shown in FIG. 31, which has been explained
in the second modification of the second embodiment,
arbitrary motions of an object can be expressed as
equations in the motion recognition section 4.

Hence, for example, actual range image data
are held at 5-frame intervals, and only the deformation
parameter values used upon deforming a range image in
the image deformation section 2 are held in place of
actual range image data for the four frames between the
held range image data, thus forming compressed images.
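
The keyframe-plus-parameters scheme described above can be
sketched as follows. This is a minimal sketch under stated
assumptions: the tagged-tuple layout, the name
compress_stream, and the convention that params[i] holds
the deformation parameters for frame i are all hypothetical
and are not taken from the original description.

```python
def compress_stream(frames, params, interval=5):
    """Keep actual range image data every `interval` frames, parameters otherwise.

    frames[i] is the range image of frame i.
    params[i] is the set of deformation parameter values recognized for
    frame i (hypothetical layout); it is ignored for frames kept as data.
    Returns a list of ("frame", image) and ("params", values) entries.
    """
    compressed = []
    for i, frame in enumerate(frames):
        if i % interval == 0:
            compressed.append(("frame", frame))      # actual range image data
        else:
            compressed.append(("params", params[i]))  # parameters only
    return compressed


if __name__ == "__main__":
    frames = ["img%d" % i for i in range(10)]
    params = ["p%d" % i for i in range(10)]
    for entry in compress_stream(frames, params):
        print(entry)
```

With the default interval of 5, each group of five frames
is stored as one actual range image followed by four
parameter sets, matching the example in the text.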

Since the actual range image data requires 8 bits
per pixel when it is defined by 64 pixels
(vertical) x 64 pixels (horizontal) x 256 gray levels
(depth), a total data size of 64 x 64 x 8 = 32,768 bits
= 4,096 bytes is required per frame. That is, if the
data size of the deformation parameters is smaller than
this required data size, the data is compressed. For
example, in the case of rotation deformation, the
parameters required are the coordinate position
(x, y, z) of the center of rotation, and the rotation
angles (θx, θy, θz). Since each of x, y, and z need
only express a value ranging from 0 to 63, it requires
6 bits. Since each rotation angle need only express
a value ranging from 0 to 360, it requires 9 bits in
the case of integer precision (32 bits even in the case
of floating point precision). That is, in
rotation deformation, the total data size required for
the parameter values is around 45 bits (around 114 bits
even in the case of floating point precision). The same
applies to deformations other than rotation deformation.
Since this data size is far smaller
than the 4,096 bytes before compression, a very high
compression ratio can be expected in image compression
by the method of this embodiment.
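
The arithmetic above can be checked with a short
calculation. The figures are those given in the text; the
variable names and the five-frame group used for the
overall ratio are only illustrative.

```python
# Size of one uncompressed range image frame.
WIDTH, HEIGHT, BITS_PER_PIXEL = 64, 64, 8
frame_bits = WIDTH * HEIGHT * BITS_PER_PIXEL   # 32,768 bits
frame_bytes = frame_bits // 8                  # 4,096 bytes

# Rotation deformation: center (x, y, z) plus three rotation angles.
COORD_BITS = 6          # per coordinate, as stated in the text
ANGLE_INT_BITS = 9      # 0 to 360 with integer precision
ANGLE_FLOAT_BITS = 32   # floating point precision

int_param_bits = 3 * COORD_BITS + 3 * ANGLE_INT_BITS       # 45 bits
float_param_bits = 3 * COORD_BITS + 3 * ANGLE_FLOAT_BITS   # 114 bits

# Size of a parameter-only frame relative to an actual frame.
ratio_int = frame_bits / int_param_bits        # about 728
ratio_float = frame_bits / float_param_bits    # about 287

# One actual frame plus four parameter-only frames versus five actual frames.
group_compressed = frame_bits + 4 * int_param_bits
group_raw = 5 * frame_bits
group_fraction = group_compressed / group_raw  # about 0.20

if __name__ == "__main__":
    print(frame_bits, frame_bytes, int_param_bits, float_param_bits)
    print(round(ratio_int), round(ratio_float), round(group_fraction, 3))
```

In the 5-frame-interval example, the stored data is
therefore dominated by the one actual frame in each group,
so the stream shrinks to roughly one fifth of its
original size.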

In order to decompress the obtained compressed
image, deformed images can be sequentially generated
using the deformation parameter values on the basis of
the actual range image data present at specific frame
intervals. That is, an image decompression device for
decompressing compressed image data, which includes
actual range image data present at specific frame
intervals and the motion parameters required for
reconstructing range images between the actual range
image data (the compressed image data may be passed
using a predetermined recording medium such as a floppy
disk or via communication over a computer
network), must have a mechanism having at least
functions similar to those of the aforementioned image
deformation section 2.
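
The decompression procedure can be sketched as follows.
This is only a sketch under several assumptions: the
compressed data is taken to be a list of ("frame", image)
and ("params", values) entries, each parameter set is
applied to the previously reconstructed frame, and a simple
horizontal pixel shift stands in for the deformation
performed by the image deformation section 2. None of
these details are specified in the text.

```python
def shift_right(frame, dx):
    """Stand-in deformation: shift every row right by dx (>= 0) pixels, padding with 0."""
    width = len(frame[0])
    return [([0] * dx + row)[:width] for row in frame]


def decompress_stream(entries, deform=shift_right):
    """Rebuild all frames from actual frames and deformation parameters.

    entries: list of ("frame", image) or ("params", values) tuples
    (hypothetical layout). The first entry must be an actual frame.
    deform: function (previous_frame, values) -> reconstructed frame.
    """
    frames = []
    for kind, payload in entries:
        if kind == "frame":
            frames.append(payload)                       # stored as-is
        else:
            frames.append(deform(frames[-1], payload))   # regenerate by deformation
    return frames


if __name__ == "__main__":
    entries = [("frame", [[1, 0, 0]]), ("params", 1), ("params", 1)]
    for frame in decompress_stream(entries):
        print(frame)
```

Because the deformation function is passed in as an
argument, a receiver only needs the same deformation
mechanism as the sender to reconstruct the frames between
the actual range image data.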

Conventional image compression methods such
as MPEG (Moving Picture Experts Group) 1, MPEG2, MPEG4,
or the like compress two-dimensional images either
independently in units of frames or using difference
signals from the previous and next frames. By contrast,
the present invention compresses three-dimensional
range images using extracted motion parameters.

(Sixth Embodiment)

The sixth embodiment of the present invention will 
be described below. 

FIG. 37 shows an example of the arrangement of
an image recognition apparatus according to the sixth
embodiment of the present invention.

The image recognition apparatus of this embodiment
comprises a communication section 10 for communicating
with an external apparatus in addition to the arrangement
of the image recognition apparatus of the first,
second, third, fourth, or fifth embodiment.

With this arrangement, the result obtained in the
first, second, third, fourth, or fifth embodiment can
be sent to an external apparatus using a communication
path such as a telephone line or the like.

For example, when the communication section 10 is
added to the image recognition apparatus of the fifth
embodiment, only the compressed image data described in
the fifth embodiment is sent, and the receiving apparatus
decompresses it. Thus, only compressed data with a
small size need be sent over the communication path,
while the receiving apparatus can still obtain range
image data with a large size.

In this way, an increase in data size upon
communication, which has posed a problem in conventional
moving picture communications, can be avoided, and the
recognition result of the image recognition apparatus
of the present invention can be effectively sent to
another place via the Internet or the like.

For example, even when users at remote places play
a physical game such as jyanken (rock-paper-scissors,
a kind of morra or toss-up), boxing, or the like, they
can exchange three-dimensional images with each other
in real time by compressing and communicating captured
three-dimensional range images of the hand, body, or
the like. That is, since three-dimensional hand and
body data can be sent and reconstructed at remote
places, the users can feel, using very low-cost
apparatuses, as if they were playing the game in the
same place, thus providing great practical effects.

(Other)

Note that the aforementioned embodiments and
modifications can be appropriately combined.

In appropriate combinations of the aforementioned
embodiments and modifications, the image capture
section 1 may be omitted, and an apparatus which
recognizes motions based on input range images or range
image streams, or performs various processes based on
the recognition result, may be built.

The above-mentioned building components can be
implemented by software except for the image sensing
unit of the image capture section 1. That is, the
aforementioned sequences can be recorded on a
computer-readable recording medium as a program that
can be executed by a computer, and that medium can be
distributed.

The present invention is not limited to the above
embodiments, and various changes and modifications may
be made within its technical scope.

Additional advantages and modifications will 
readily occur to those skilled in the art. Therefore, 
the invention in its broader aspects is not limited to 
the specific details and representative embodiments 
shown and described herein. Accordingly, various 
modifications may be made without departing from the 
spirit or scope of the general inventive concept as 
defined by the appended claims and their equivalents. 



WHAT IS CLAIMED IS: 

1. An image recognition method comprising:

obtaining a deformed image by deforming a captured
range image; and

recognizing three-dimensional motion of an object
in the range image by comparing the obtained deformed
image with a newly captured range image.

2. An image recognition method comprising:

obtaining a deformed image by deforming a captured
range image; and

recognizing the presence/absence of three-dimensional
motion of an object in the range image by
comparing the obtained deformed image with a newly
captured range image, and recognizing a series of
motions recognized from each of a series of a plurality
of range images.

3. A method according to claim 2, further 
comprising predicting motion of the object on the basis 
of the series of recognized motions. 

4. A method according to claim 1, further 
comprising predicting motion of the object on the basis 
of a feature amount of the object extracted from the 
captured range image. 

5. A method according to claim 2, further
comprising predicting motion of the object on the basis
of a feature amount of the object extracted from the
captured range image.

6. A method according to claim 2, further
comprising predicting motion of the object on the basis
of a feature amount of the object extracted from the
captured range image, and the series of recognized
motions.

7. A method according to claim 1, further
comprising the step of compressing a range image
captured by an image capture unit on the basis of the
recognized motion of the object.

8. A method according to claim 2, further
comprising compressing a range image captured by an
image capture unit on the basis of the recognized
motion of the object.

9. An image recognition apparatus comprising:

an image capture unit configured to capture a
range image;

an image deformation unit configured to deform the
range image captured by said image capture unit; and

a recognition unit configured to recognize
three-dimensional motion of an object by comparing a
deformed image obtained by said image deformation unit
and a new range image captured by said image capture
unit.

10. An image recognition apparatus comprising:

an image capture unit configured to capture a
range image;

an image deformation unit configured to deform the
range image captured by said image capture unit;

a first recognition unit configured to recognize
three-dimensional motion of an object by comparing a
deformed image obtained by said image deformation unit
and a new range image captured by said image capture
unit; and

a second recognition unit configured to recognize
a series of motions recognized from each of a series of
a plurality of range images by said first recognition
unit.

11. An image recognition apparatus comprising:

an image capture unit configured to capture a
range image;

a storage unit configured to store the range image
captured by said image capture unit;

an image deformation unit configured to deform a
designated range image of the range image captured by
said image capture unit and/or the range image stored
in said storage unit; and

a recognition unit configured to recognize the
presence/absence of three-dimensional motion of an
object by comparing one of a deformed image obtained by
said image deformation unit and the range image stored
in said storage unit, and a new range image captured by
said image capture unit.

12. An image recognition apparatus comprising:

an image capture unit configured to capture
a range image;

a storage unit configured to store the range image
captured by said image capture unit;

an image deformation unit configured to deform a
designated range image of the range image captured by
said image capture unit and/or the range image stored
in said storage unit;

a first recognition unit configured to recognize
three-dimensional motion of an object by comparing one
of a deformed image obtained by said image deformation
unit and the range image stored in said storage unit,
and a new range image captured by said image capture
unit; and

a second recognition unit configured to recognize
a series of motions recognized from each of a series of
a plurality of range images by said first recognition
unit.

13. An apparatus according to claim 10, further
comprising:

a prediction unit configured to predict motion of
the object on the basis of the series of motions
recognized by said second recognition unit.

14. An apparatus according to claim 11, further
comprising:

a prediction unit configured to predict motion
of the object on the basis of the series of motions
recognized by said second recognition unit.

15. An apparatus according to claim 12, further
comprising:

a prediction unit configured to predict motion
of the object on the basis of the series of motions
recognized by said second recognition unit.

16. An apparatus according to claim 9, further
comprising:

a feature amount extraction unit configured to
extract a feature amount of the object from the range
image captured by said image capture unit; and

a prediction unit configured to predict motion of
the object on the basis of the feature amount extracted
by said feature amount extraction unit.

17. An apparatus according to claim 10, further
comprising:

a feature amount extraction unit configured to
extract a feature amount of the object from the range
image captured by said image capture unit; and

a prediction unit configured to predict motion of
the object on the basis of the feature amount extracted
by said feature amount extraction unit.

18. An apparatus according to claim 11, further
comprising:

a feature amount extraction unit configured to
extract a feature amount of the object from the range
image captured by said image capture unit; and

a prediction unit configured to predict motion of
the object on the basis of the feature amount extracted
by said feature amount extraction unit.

19. An apparatus according to claim 12, further
comprising:

a feature amount extraction unit configured to
extract a feature amount of the object from the range
image captured by said image capture unit; and

a prediction unit configured to predict motion of
the object on the basis of the feature amount extracted
by said feature amount extraction unit.

20. An apparatus according to claim 10, further
comprising:

a feature amount extraction unit configured to
extract a feature amount of the object from the range
image captured by said image capture unit; and

a prediction unit configured to predict motion of
the object on the basis of the feature amount extracted
by said feature amount extraction unit, and the series
of motions recognized by said second recognition unit.

21. An apparatus according to claim 11, further
comprising:

a feature amount extraction unit configured to
extract a feature amount of the object from the range
image captured by said image capture unit; and

a prediction unit configured to predict motion of
the object on the basis of the feature amount extracted
by said feature amount extraction unit, and the series
of motions recognized by said second recognition unit.

22. An apparatus according to claim 12, further
comprising:

a feature amount extraction unit configured to
extract a feature amount of the object from the range
image captured by said image capture unit; and

a prediction unit configured to predict motion of
the object on the basis of the feature amount extracted
by said feature amount extraction unit, and the series
of motions recognized by said second recognition unit.

23. An apparatus according to claim 9, further
comprising:

an image compression unit configured to compress
the range image captured by said image capture unit on
the basis of the recognized motion of the object.

24. An apparatus according to claim 13, further
comprising:

an image compression unit configured to compress
the range image captured by said image capture unit on
the basis of the recognized motion of the object.

25. An apparatus according to claim 11, further
comprising:

an image compression unit configured to compress
the range image captured by said image capture unit on
the basis of the recognized motion of the object.

26. An apparatus according to claim 12, further
comprising:

an image compression unit configured to compress
the range image captured by said image capture unit on
the basis of the recognized motion of the object.

27. An article of manufacture comprised of a
computer-usable medium having computer-readable program
code means that implements computer-readable program
code means for recognizing an image, comprising:

computer-readable program code means for making
a computer capture a range image;

computer-readable program code means for making
the computer deform the range image captured by the
image capture means; and

computer-readable program code means for making
the computer recognize the presence/absence of
three-dimensional motion of an object by comparing a
deformed image obtained by the image deformation means
and a new range image captured by the image capture
means.

28. An article of manufacture comprised of a
computer-usable medium having computer-readable program
code means that implements computer-readable program
code means for recognizing an image, comprising:

computer-readable program code means for making
a computer capture a range image;

computer-readable program code means for making
the computer deform the range image captured by the
image capture means;

computer-readable program code means for making
the computer recognize the presence/absence of
three-dimensional motion of an object by comparing a
deformed image obtained by the image deformation means
and a new range image captured by the image capture
means; and

computer-readable program code means for making
the computer recognize a series of motions recognized
from a series of a plurality of range images by the
recognition means.

ABSTRACT OF THE DISCLOSURE
An image recognition apparatus has an image
capture section for capturing a range image, an image
deformation section for deforming the range image
captured by the image capture section, and a recognition
section for recognizing the presence/absence of
three-dimensional motion of an object by comparing
the deformed image obtained by the image deformation
section and a new range image captured by the image
capture section.